Predicting the Outcome of Cricket Matches

Introduction

In this project, we shall build a model which predicts the outcome of cricket matches in the Indian Premier League using data about matches and deliveries.

Data Mining:

Season : 2008 - 2015 (8 Seasons)
Teams : DD, KKR, MI, RCB, KXIP, RR, CSK (7 Teams)
Neglect matches that have inconsistencies such as No Result, Tie, D/L Method, etc.

Possible Features:

Average Batsman Rating (top 5)
Average Bowler Rating (top 4)
Player of the match frequency
Previous Encounter - Win by runs, Win by Wickets
Recent form (Last 5 Games)
Venue - Home, Away, Neutral



In [47]:

    
# The %... is an iPython thing, and is not part of the Python language.
# In this case we're just telling the plotting library to draw things on
# the notebook, instead of on a separate window.
%matplotlib inline 
#this line above prepares IPython notebook for working with matplotlib

# See all the "as ..." contructs? They're just aliasing the package names.
# That way we can call methods like plt.plot() instead of matplotlib.pyplot.plot().

import numpy as np # imports a fast numerical programming library
import scipy as sp #imports stats functions, amongst other things
import matplotlib as mpl # this actually imports matplotlib
import matplotlib.cm as cm #allows us easy access to colormaps
import matplotlib.pyplot as plt #sets up plotting under plt
import pandas as pd #lets us handle data as dataframes
#sets up pandas table display
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)
import seaborn as sns #sets up styles and gives us more plotting options
from __future__ import division

Data Mining



In [48]:

    
# Reading in the data
allmatches = pd.read_csv("../data/matches.csv")
alldeliveries = pd.read_csv("../data/deliveries.csv")
allmatches.head(10)









    Out[48]:







  
    
      
      id
      season
      city
      date
      team1
      team2
      toss_winner
      toss_decision
      result
      dl_applied
      winner
      win_by_runs
      win_by_wickets
      player_of_match
      venue
      umpire1
      umpire2
      umpire3
    
  
  
    
      0
      1
      2008
      Bangalore
      2008-04-18
      Kolkata Knight Riders
      Royal Challengers Bangalore
      Royal Challengers Bangalore
      field
      normal
      0
      Kolkata Knight Riders
      140
      0
      BB McCullum
      M Chinnaswamy Stadium
      Asad Rauf
      RE Koertzen
      NaN
    
    
      1
      2
      2008
      Chandigarh
      2008-04-19
      Chennai Super Kings
      Kings XI Punjab
      Chennai Super Kings
      bat
      normal
      0
      Chennai Super Kings
      33
      0
      MEK Hussey
      Punjab Cricket Association Stadium, Mohali
      MR Benson
      SL Shastri
      NaN
    
    
      2
      3
      2008
      Delhi
      2008-04-19
      Rajasthan Royals
      Delhi Daredevils
      Rajasthan Royals
      bat
      normal
      0
      Delhi Daredevils
      0
      9
      MF Maharoof
      Feroz Shah Kotla
      Aleem Dar
      GA Pratapkumar
      NaN
    
    
      3
      4
      2008
      Mumbai
      2008-04-20
      Mumbai Indians
      Royal Challengers Bangalore
      Mumbai Indians
      bat
      normal
      0
      Royal Challengers Bangalore
      0
      5
      MV Boucher
      Wankhede Stadium
      SJ Davis
      DJ Harper
      NaN
    
    
      4
      5
      2008
      Kolkata
      2008-04-20
      Deccan Chargers
      Kolkata Knight Riders
      Deccan Chargers
      bat
      normal
      0
      Kolkata Knight Riders
      0
      5
      DJ Hussey
      Eden Gardens
      BF Bowden
      K Hariharan
      NaN
    
    
      5
      6
      2008
      Jaipur
      2008-04-21
      Kings XI Punjab
      Rajasthan Royals
      Kings XI Punjab
      bat
      normal
      0
      Rajasthan Royals
      0
      6
      SR Watson
      Sawai Mansingh Stadium
      Aleem Dar
      RB Tiffin
      NaN
    
    
      6
      7
      2008
      Hyderabad
      2008-04-22
      Deccan Chargers
      Delhi Daredevils
      Deccan Chargers
      bat
      normal
      0
      Delhi Daredevils
      0
      9
      V Sehwag
      Rajiv Gandhi International Stadium, Uppal
      IL Howell
      AM Saheba
      NaN
    
    
      7
      8
      2008
      Chennai
      2008-04-23
      Chennai Super Kings
      Mumbai Indians
      Mumbai Indians
      field
      normal
      0
      Chennai Super Kings
      6
      0
      ML Hayden
      MA Chidambaram Stadium, Chepauk
      DJ Harper
      GA Pratapkumar
      NaN
    
    
      8
      9
      2008
      Hyderabad
      2008-04-24
      Deccan Chargers
      Rajasthan Royals
      Rajasthan Royals
      field
      normal
      0
      Rajasthan Royals
      0
      3
      YK Pathan
      Rajiv Gandhi International Stadium, Uppal
      Asad Rauf
      MR Benson
      NaN
    
    
      9
      10
      2008
      Chandigarh
      2008-04-25
      Kings XI Punjab
      Mumbai Indians
      Mumbai Indians
      field
      normal
      0
      Kings XI Punjab
      66
      0
      KC Sangakkara
      Punjab Cricket Association Stadium, Mohali
      Aleem Dar
      AM Saheba
      NaN



In [49]:

    
# Selecting Seasons 2008 - 2015
matches_seasons = allmatches.loc[allmatches['season'] != 2016]
deliveries_seasons = alldeliveries.loc[alldeliveries['match_id'] < 518]



In [50]:

    
# Selecting teams DD, KKR, MI, RCB, KXIP, RR, CSK
matches_teams = matches_seasons.loc[(matches_seasons['team1'].isin(['Kolkata Knight Riders', \
'Royal Challengers Bangalore', 'Delhi Daredevils', 'Chennai Super Kings', 'Rajasthan Royals', \
'Mumbai Indians', 'Kings XI Punjab'])) & (matches_seasons['team2'].isin(['Kolkata Knight Riders', \
'Royal Challengers Bangalore', 'Delhi Daredevils', 'Chennai Super Kings', 'Rajasthan Royals', \
'Mumbai Indians', 'Kings XI Punjab']))]
matches_team_matchids = matches_teams.id.unique()
deliveries_teams = deliveries_seasons.loc[deliveries_seasons['match_id'].isin(matches_team_matchids)]
print "Teams selected:\n"
for team in matches_teams.team1.unique():
    print team









    



Teams selected:

Kolkata Knight Riders
Chennai Super Kings
Rajasthan Royals
Mumbai Indians
Kings XI Punjab
Royal Challengers Bangalore
Delhi Daredevils



In [51]:

    
# Neglect matches with inconsistencies like 'No Result' or 'D/L Applied'
matches = matches_teams.loc[(matches_teams['result'] == 'normal') & (matches_teams['dl_applied'] == 0)]
matches_matchids = matches.id.unique()
deliveries = deliveries_teams.loc[deliveries_teams['match_id'].isin(matches_matchids)]



In [52]:

    
# Verifying consistency between datasets
(matches.id.unique() == deliveries.match_id.unique()).all()









    Out[52]:





True

Building Features



In [53]:

    
# Team Strike rates for first 5 batsmen in the team (Higher the better)

def getMatchDeliveriesDF(match_id):
    return deliveries.loc[deliveries['match_id'] == match_id]

def getInningsOneBatsmen(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 1].batsman.unique()[0:5]

def getInningsTwoBatsmen(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 2].batsman.unique()[0:5]

def getBatsmanStrikeRate(batsman, match_id):
    onstrikedeliveries = deliveries.loc[(deliveries['match_id'] < match_id) & (deliveries['batsman'] == batsman)]
    total_runs = onstrikedeliveries['batsman_runs'].sum()
    total_balls = onstrikedeliveries.shape[0]
    if total_balls != 0: 
        return (total_runs/total_balls) * 100
    else:
        return None


def getTeamStrikeRate(batsmen, match_id):
    strike_rates = []
    for batsman in batsmen:
        bsr = getBatsmanStrikeRate(batsman, match_id)
        if bsr != None:
            strike_rates.append(bsr)
    return np.mean(strike_rates)

def getAverageStrikeRates(match_id):
    match_deliveries = getMatchDeliveriesDF(match_id)
    innOneBatsmen = getInningsOneBatsmen(match_deliveries)
    innTwoBatsmen = getInningsTwoBatsmen(match_deliveries)
    teamOneSR = getTeamStrikeRate(innOneBatsmen, match_id)
    teamTwoSR = getTeamStrikeRate(innTwoBatsmen, match_id)
    return teamOneSR, teamTwoSR



In [54]:

    
# Testing Functionality
getAverageStrikeRates(517)









    Out[54]:





(126.98024523159935, 128.55579510411653)



In [55]:

    
# Bowler Rating : Wickets/Run (Higher the Better)
# Team 1: Batting First; Team 2: Fielding First

def getInningsOneBowlers(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 1].bowler.unique()[0:4]

def getInningsTwoBowlers(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 2].bowler.unique()[0:4]

def getBowlerWPR(bowler, match_id):
    balls = deliveries.loc[(deliveries['match_id'] < match_id) & (deliveries['bowler'] == bowler)]
    total_runs = balls['total_runs'].sum()
    total_wickets = balls.loc[balls['dismissal_kind'].isin(['caught', 'bowled', 'lbw', \
    'caught and bowled', 'stumped'])].shape[0]
    if total_runs != 0:
        return (total_wickets/total_runs) * 100
    else:
        return total_wickets

def getTeamWPR(bowlers, match_id):
    totalWPRs = []
    for bowler in bowlers:
        totalWPRs.append(getBowlerWPR(bowler, match_id))
    return np.mean(totalWPRs)

def getAverageWPR(match_id):
    match_deliveries = getMatchDeliveriesDF(match_id)
    innOneBowlers = getInningsOneBowlers(match_deliveries)
    innTwoBowlers = getInningsTwoBowlers(match_deliveries)
    teamOneWPR = getTeamWPR(innTwoBowlers, match_id)
    teamTwoWPR = getTeamWPR(innOneBowlers, match_id)
    return teamOneWPR, teamTwoWPR



In [56]:

    
#Testing Functionality 
getAverageWPR(517)









    Out[56]:





(2.7641806594085776, 4.4721111768026631)



In [57]:

    
# Man of the Match Awards for players of both Teams 

def getInningsOneAllBatsmen(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 1].batsman.unique()

def getInningsTwoAllBatsmen(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 2].batsman.unique()

def getInningsOneAllBowlers(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 2].bowler.unique()

def getInningsTwoAllBowlers(match_deliveries):
    return match_deliveries.loc[match_deliveries['inning'] == 1].bowler.unique()

def getTeam(batsmen,bowlers):
    p = []
    p = np.append(p, batsmen)
    for i in bowlers:
        if i not in batsmen:
            p = np.append(p, i)
    return p

def getPlayerMVPAwards(player, match_id):
    return matches.loc[(matches["player_of_match"] == player) & (matches['id'] < match_id)].shape[0]

def getTeamMVPAwards(team, match_id):
    mvpAwards = 0
    for player in team:
        mvpAwards = mvpAwards + getPlayerMVPAwards(player,match_id)
    
    return mvpAwards

def bothTeamMVPAwards(match_id):
    matchDeliveries = getMatchDeliveriesDF(match_id)
    innOneBatsmen = getInningsOneAllBatsmen(matchDeliveries)
    innTwoBatsmen = getInningsTwoAllBatsmen(matchDeliveries)
    innOneBowlers = getInningsTwoAllBowlers(matchDeliveries)
    innTwoBowlers = getInningsOneAllBowlers(matchDeliveries)
    team1 = getTeam(innOneBatsmen, innTwoBowlers)
    
    team2 = getTeam(innTwoBatsmen, innOneBowlers)
    team1Awards = getTeamMVPAwards(team1,match_id)
    team2Awards = getTeamMVPAwards(team2,match_id)
    return team1Awards, team2Awards



In [58]:

    
#Testing Functionality
bothTeamMVPAwards(517)









    Out[58]:





(28, 52)



In [59]:

    
#Function to generate squad rating 

def generateSquadRating(match_id):
    gameday_teams = deliveries.loc[(deliveries['match_id'] == match_id)].batting_team.unique()
    teamOne = gameday_teams[0]
    teamTwo = gameday_teams[1]
    teamOneSR, teamTwoSR = getAverageStrikeRates(match_id)
    teamOneWPR, teamTwoWPR = getAverageWPR(match_id)
    teamOneMVPs, teamTwoMVPs = bothTeamMVPAwards(match_id)
    print "Comparing squads for {} vs {}".format(teamOne,teamTwo)
    print "\nAverage Strike Rate for Batsmen in {} : {}".format(teamOne,teamOneSR)
    print "\nAverage Strike Rate for Batsmen in {} : {}".format(teamTwo,teamTwoSR)
    print "\nBowler Rating (W/R) for {} : {}".format(teamOne,teamOneWPR)
    print "\nBowler Rating (W/R) for {} : {}".format(teamTwo,teamTwoWPR)
    print "\nNumber of MVP Awards in {} : {}".format(teamOne,teamOneMVPs)
    print "\nNumber of MVP Awards in {} : {}".format(teamTwo,teamTwoMVPs)



In [60]:

    
#Testing Functionality
generateSquadRating(517)









    



Comparing squads for Mumbai Indians vs Chennai Super Kings

Average Strike Rate for Batsmen in Mumbai Indians : 126.980245232

Average Strike Rate for Batsmen in Chennai Super Kings : 128.555795104

Bowler Rating (W/R) for Mumbai Indians : 2.76418065941

Bowler Rating (W/R) for Chennai Super Kings : 4.4721111768

Number of MVP Awards in Mumbai Indians : 28

Number of MVP Awards in Chennai Super Kings : 52



In [61]:

    
## 2nd Feature : Previous Encounter
# Won by runs and won by wickets (Higher the better)

def getTeam1(match_id):
    return matches.loc[matches["id"] == match_id].team1.unique()

def getTeam2(match_id):
    return matches.loc[matches["id"] == match_id].team2.unique()

def getPreviousEncDF(match_id):
    team1 = getTeam1(match_id)
    team2 = getTeam2(match_id)
    return matches.loc[(matches["id"] < match_id) & (((matches["team1"].isin(team1)) & (matches["team2"].isin(team2))) | ((matches["team1"].isin(team2)) & (matches["team2"].isin(team1))))]
def getTeamWBR(match_id, team):
    WBR = 0
    DF = getPreviousEncDF(match_id)
    winnerDF = DF.loc[DF["winner"] == team]
    WBR = winnerDF['win_by_runs'].sum()    
    return WBR


def getTeamWBW(match_id, team):
    WBW = 0 
    DF = getPreviousEncDF(match_id)
    winnerDF = DF.loc[DF["winner"] == team]
    WBW = winnerDF['win_by_wickets'].sum()
    return WBW 
    
def getTeamWinPerc(match_id):
    dF = getPreviousEncDF(match_id)
    timesPlayed = dF.shape[0]
    team1 = getTeam1(match_id)[0].strip("[]")
    timesWon = dF.loc[dF["winner"] == team1].shape[0]
    if timesPlayed != 0:
        winPerc = (timesWon/timesPlayed) * 100
    else:
        winPerc = 0
    return winPerc

def getBothTeamStats(match_id):
    DF = getPreviousEncDF(match_id)
    team1 = getTeam1(match_id)[0].strip("[]")
    team2 = getTeam2(match_id)[0].strip("[]")
    timesPlayed = DF.shape[0]
    timesWon = DF.loc[DF["winner"] == team1].shape[0]
    WBRTeam1 = getTeamWBR(match_id, team1)
    WBRTeam2 = getTeamWBR(match_id, team2)
    WBWTeam1 = getTeamWBW(match_id, team1)
    WBWTeam2 = getTeamWBW(match_id, team2)

    print "Out of {} times in the past {} have won {} times({}%) from {}".format(timesPlayed, team1, timesWon, getTeamWinPerc(match_id), team2)
    print "{} won by {} total runs and {} total wickets.".format(team1, WBRTeam1, WBWTeam1)
    print "{} won by {} total runs and {} total wickets.".format(team2, WBRTeam2, WBWTeam2)



In [62]:

    
#Testing functionality 
getBothTeamStats(517)









    



Out of 21 times in the past Mumbai Indians have won 11 times(52.380952381%) from Chennai Super Kings
Mumbai Indians won by 144 total runs and 30 total wickets.
Chennai Super Kings won by 138 total runs and 31 total wickets.



In [63]:

    
#3rd Feature: Recent Form (Win Percentage of 3 previous matches of a team in the same season)
#Higher the better

def getMatchYear(match_id):
    return matches.loc[matches["id"] == match_id].season.unique()

def getTeam1DF(match_id, year):
    team1 = getTeam1(match_id)
    return matches.loc[(matches["id"] < match_id) & (matches["season"] == year) & ((matches["team1"].isin(team1)) | (matches["team2"].isin(team1)))].tail(3)

def getTeam2DF(match_id, year):
    team2 = getTeam2(match_id)
    return matches.loc[(matches["id"] < match_id) & (matches["season"] == year) & ((matches["team1"].isin(team2)) | (matches["team2"].isin(team2)))].tail(3)

def getTeamWinPercentage(match_id):
    win = 0
    total = 0
    year = int(getMatchYear(match_id))
    team1 = getTeam1(match_id)[0].strip("[]")
    team2 = getTeam2(match_id)[0].strip("[]")
    team1DF = getTeam1DF(match_id, year)
    team2DF = getTeam2DF(match_id, year)
    team1TotalMatches = team1DF.shape[0]
    team1WinMatches = team1DF.loc[team1DF["winner"] == team1].shape[0]
    team2TotalMatches = team2DF.shape[0]
    team2WinMatches = team2DF.loc[team2DF["winner"] == team2].shape[0]
    if (team1TotalMatches != 0) and (team2TotalMatches !=0):
        winPercTeam1 = ((team1WinMatches / team1TotalMatches) * 100) 
        winPercTeam2 = ((team2WinMatches / team2TotalMatches) * 100) 
    elif (team1TotalMatches != 0) and (team2TotalMatches ==0):
        winPercTeam1 = ((team1WinMatches / team1TotalMatches) * 100) 
        winPercTeam2 = 0
    elif (team1TotalMatches == 0) and (team2TotalMatches !=0):
        winPercTeam1 = 0
        winPercTeam2 = ((team2WinMatches / team2TotalMatches) * 100) 
    else:
        winPercTeam1 = 0
        winPercTeam2 = 0
        
    return winPercTeam1, winPercTeam2
    
    
def displayTeamWin(match_id):
    year = int(getMatchYear(match_id))
    team1 = getTeam1(match_id)[0].strip("[]")
    team2 = getTeam2(match_id)[0].strip("[]")
    P,Q = getTeamWinPercentage(match_id)
    print "In the season of {}, {} has a win percentage of {}% and {} has a win percentage of {}% ".format(year, team1, P, team2, Q)



In [64]:

    
#Function to implement all features
def getAllFeatures(match_id):
    generateSquadRating(match_id)
    print ("\n")
    getBothTeamStats(match_id)
    print("\n")
    displayTeamWin(match_id)



In [65]:

    
#Testing Functionality
getAllFeatures(517)









    



Comparing squads for Mumbai Indians vs Chennai Super Kings

Average Strike Rate for Batsmen in Mumbai Indians : 126.980245232

Average Strike Rate for Batsmen in Chennai Super Kings : 128.555795104

Bowler Rating (W/R) for Mumbai Indians : 2.76418065941

Bowler Rating (W/R) for Chennai Super Kings : 4.4721111768

Number of MVP Awards in Mumbai Indians : 28

Number of MVP Awards in Chennai Super Kings : 52


Out of 21 times in the past Mumbai Indians have won 11 times(52.380952381%) from Chennai Super Kings
Mumbai Indians won by 144 total runs and 30 total wickets.
Chennai Super Kings won by 138 total runs and 31 total wickets.


In the season of 2015, Mumbai Indians has a win percentage of 66.6666666667% and Chennai Super Kings has a win percentage of 66.6666666667%

Adding Columns



In [66]:

    
#Create Column for Team 1 Winning Status (1 = Won, 0 = Lost)

matches['team1Winning'] = np.where(matches['team1'] == matches['winner'], 1, 0)









    



/Users/gursahej/anaconda2/lib/python2.7/site-packages/ipykernel_launcher.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until



In [67]:

    
#New Column for Difference of Average Strike rates (First Team SR - Second Team SR) [Negative value means Second team is better]

firstTeamSR = []
secondTeamSR = []
for i in matches['id'].unique():
    P, Q = getAverageStrikeRates(i)
    firstTeamSR.append(P), secondTeamSR.append(Q)
firstSRSeries = pd.Series(firstTeamSR)
secondSRSeries = pd.Series(secondTeamSR)
matches["Avg_SR_Difference"] = firstSRSeries.values - secondSRSeries.values









    



/Users/gursahej/anaconda2/lib/python2.7/site-packages/ipykernel_launcher.py:10: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  # Remove the CWD from sys.path while we load stuff.



In [68]:

    
#New Column for Difference of Wickets Per Run (First Team WPR - Second Team WPR) [Negative value means Second team is better]

firstTeamWPR = []
secondTeamWPR = []
for i in matches['id'].unique():
    R, S = getAverageWPR(i)
    firstTeamWPR.append(R), secondTeamWPR.append(S)
firstWPRSeries = pd.Series(firstTeamWPR)
secondWPRSeries = pd.Series(secondTeamWPR)
matches["Avg_WPR_Difference"] = firstWPRSeries.values - secondWPRSeries.values









    



/Users/gursahej/anaconda2/lib/python2.7/site-packages/ipykernel_launcher.py:10: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  # Remove the CWD from sys.path while we load stuff.



In [69]:

    
#New column for difference of MVP Awards (Negative value means Second team is better)

firstTeamMVP = []
secondTeamMVP = []
for i in matches['id'].unique():
    T, U = bothTeamMVPAwards(i)
    firstTeamMVP.append(T), secondTeamMVP.append(U)
firstMVPSeries = pd.Series(firstTeamMVP)
secondMVPSeries = pd.Series(secondTeamMVP)
matches["Total_MVP_Difference"] = firstMVPSeries.values - secondMVPSeries.values









    



/Users/gursahej/anaconda2/lib/python2.7/site-packages/ipykernel_launcher.py:10: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  # Remove the CWD from sys.path while we load stuff.



In [70]:

    
#New column for win percentage of Team1 in previous encounter 

firstTeamWP = []
for i in matches['id'].unique():
    WP = getTeamWinPerc(i)
    firstTeamWP.append(WP)
firstWPSeries = pd.Series(firstTeamWP)
matches["Prev_Enc_Team1_WinPerc"] = firstWPSeries.values









    



/Users/gursahej/anaconda2/lib/python2.7/site-packages/ipykernel_launcher.py:8: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy



In [71]:

    
#New column for Recent form(Win Percentage in the current season) of 1st Team compared to 2nd Team(Negative means 2nd team has higher win percentage)

firstTeamRF = []
secondTeamRF = []
for i in matches['id'].unique():
    K, L = getTeamWinPercentage(i)
    firstTeamRF.append(K), secondTeamRF.append(L)
firstRFSeries = pd.Series(firstTeamRF)
secondRFSeries = pd.Series(secondTeamRF)
matches["Total_RF_Difference"] = firstRFSeries.values - secondRFSeries.values









    



/Users/gursahej/anaconda2/lib/python2.7/site-packages/ipykernel_launcher.py:10: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  # Remove the CWD from sys.path while we load stuff.



In [72]:

    
#Testing 
matches.tail(20)









    Out[72]:







  
    
      
      id
      season
      city
      date
      team1
      team2
      toss_winner
      toss_decision
      result
      dl_applied
      winner
      win_by_runs
      win_by_wickets
      player_of_match
      venue
      umpire1
      umpire2
      umpire3
      team1Winning
      Avg_SR_Difference
      Avg_WPR_Difference
      Total_MVP_Difference
      Prev_Enc_Team1_WinPerc
      Total_RF_Difference
    
  
  
    
      489
      490
      2015
      Mumbai
      2015-05-01
      Mumbai Indians
      Rajasthan Royals
      Rajasthan Royals
      field
      normal
      0
      Mumbai Indians
      8
      0
      AT Rayudu
      Wankhede Stadium
      HDPK Dharmasena
      CK Nandan
      NaN
      1
      -13.536521
      0.097342
      5
      60.000000
      -33.333333
    
    
      490
      491
      2015
      Bangalore
      2015-05-02
      Kolkata Knight Riders
      Royal Challengers Bangalore
      Royal Challengers Bangalore
      field
      normal
      0
      Royal Challengers Bangalore
      0
      7
      Mandeep Singh
      M Chinnaswamy Stadium
      JD Cloete
      PG Pathak
      NaN
      0
      4.510188
      -0.570758
      7
      57.142857
      0.000000
    
    
      492
      493
      2015
      Chandigarh
      2015-05-03
      Mumbai Indians
      Kings XI Punjab
      Mumbai Indians
      bat
      normal
      0
      Mumbai Indians
      23
      0
      LMP Simmons
      Punjab Cricket Association Stadium, Mohali
      RK Illingworth
      VA Kulkarni
      NaN
      1
      -13.615487
      -0.737024
      -1
      46.666667
      66.666667
    
    
      493
      494
      2015
      Mumbai
      2015-05-03
      Rajasthan Royals
      Delhi Daredevils
      Delhi Daredevils
      field
      normal
      0
      Rajasthan Royals
      14
      0
      AM Rahane
      Brabourne Stadium
      HDPK Dharmasena
      CB Gaffaney
      NaN
      1
      10.775508
      0.786552
      7
      60.000000
      0.000000
    
    
      494
      495
      2015
      Chennai
      2015-05-04
      Chennai Super Kings
      Royal Challengers Bangalore
      Chennai Super Kings
      bat
      normal
      0
      Chennai Super Kings
      24
      0
      SK Raina
      MA Chidambaram Stadium, Chepauk
      C Shamshuddin
      K Srinath
      NaN
      1
      4.569589
      -0.231439
      25
      58.823529
      0.000000
    
    
      496
      497
      2015
      Mumbai
      2015-05-05
      Delhi Daredevils
      Mumbai Indians
      Delhi Daredevils
      bat
      normal
      0
      Mumbai Indians
      0
      5
      Harbhajan Singh
      Wankhede Stadium
      HDPK Dharmasena
      CB Gaffaney
      NaN
      0
      -18.726971
      -0.755737
      -14
      53.333333
      -33.333333
    
    
      497
      498
      2015
      Bangalore
      2015-05-06
      Royal Challengers Bangalore
      Kings XI Punjab
      Kings XI Punjab
      field
      normal
      0
      Royal Challengers Bangalore
      138
      0
      CH Gayle
      M Chinnaswamy Stadium
      RK Illingworth
      VA Kulkarni
      NaN
      1
      -5.574733
      0.886535
      5
      35.714286
      66.666667
    
    
      499
      500
      2015
      Chennai
      2015-05-08
      Chennai Super Kings
      Mumbai Indians
      Chennai Super Kings
      bat
      normal
      0
      Mumbai Indians
      0
      6
      HH Pandya
      MA Chidambaram Stadium, Chepauk
      CB Gaffaney
      CK Nandan
      NaN
      0
      5.014283
      1.972348
      17
      52.631579
      0.000000
    
    
      500
      501
      2015
      Kolkata
      2015-05-09
      Kings XI Punjab
      Kolkata Knight Riders
      Kings XI Punjab
      bat
      normal
      0
      Kolkata Knight Riders
      0
      1
      AD Russell
      Eden Gardens
      AK Chaudhary
      HDPK Dharmasena
      NaN
      0
      6.415078
      0.254813
      -18
      40.000000
      -33.333333
    
    
      502
      503
      2015
      Mumbai
      2015-05-10
      Royal Challengers Bangalore
      Mumbai Indians
      Royal Challengers Bangalore
      bat
      normal
      0
      Royal Challengers Bangalore
      39
      0
      AB de Villiers
      Wankhede Stadium
      JD Cloete
      C Shamshuddin
      NaN
      1
      -1.343308
      2.062729
      -3
      43.750000
      -33.333333
    
    
      503
      504
      2015
      Chennai
      2015-05-10
      Chennai Super Kings
      Rajasthan Royals
      Chennai Super Kings
      bat
      normal
      0
      Chennai Super Kings
      12
      0
      RA Jadeja
      MA Chidambaram Stadium, Chepauk
      M Erasmus
      CK Nandan
      NaN
      1
      -5.738591
      -0.046456
      16
      62.500000
      33.333333
    
    
      505
      506
      2015
      Raipur
      2015-05-12
      Chennai Super Kings
      Delhi Daredevils
      Chennai Super Kings
      bat
      normal
      0
      Delhi Daredevils
      0
      6
      Z Khan
      Shaheed Veer Narayan Singh International Stadium
      RK Illingworth
      VA Kulkarni
      NaN
      0
      6.941454
      1.678318
      28
      73.333333
      33.333333
    
    
      506
      507
      2015
      Chandigarh
      2015-05-13
      Kings XI Punjab
      Royal Challengers Bangalore
      Royal Challengers Bangalore
      field
      normal
      0
      Kings XI Punjab
      22
      0
      AR Patel
      Punjab Cricket Association Stadium, Mohali
      JD Cloete
      C Shamshuddin
      NaN
      1
      5.622383
      -1.324729
      -16
      60.000000
      -66.666667
    
    
      507
      508
      2015
      Mumbai
      2015-05-14
      Mumbai Indians
      Kolkata Knight Riders
      Kolkata Knight Riders
      field
      normal
      0
      Mumbai Indians
      5
      0
      HH Pandya
      Wankhede Stadium
      RK Illingworth
      VA Kulkarni
      NaN
      1
      -0.677689
      -0.313345
      -11
      66.666667
      33.333333
    
    
      509
      510
      2015
      Chandigarh
      2015-05-16
      Kings XI Punjab
      Chennai Super Kings
      Kings XI Punjab
      bat
      normal
      0
      Chennai Super Kings
      0
      7
      P Negi
      Punjab Cricket Association Stadium, Mohali
      CK Nandan
      C Shamshuddin
      NaN
      0
      -0.716536
      1.824407
      -33
      42.857143
      0.000000
    
    
      510
      511
      2015
      Mumbai
      2015-05-16
      Rajasthan Royals
      Kolkata Knight Riders
      Rajasthan Royals
      bat
      normal
      0
      Rajasthan Royals
      9
      0
      SR Watson
      Brabourne Stadium
      RM Deshpande
      RK Illingworth
      NaN
      1
      -3.303823
      -1.181868
      -16
      50.000000
      0.000000
    
    
      513
      514
      2015
      Mumbai
      2015-05-19
      Mumbai Indians
      Chennai Super Kings
      Mumbai Indians
      bat
      normal
      0
      Mumbai Indians
      25
      0
      KA Pollard
      Wankhede Stadium
      HDPK Dharmasena
      RK Illingworth
      NaN
      1
      6.315981
      -0.617777
      -24
      50.000000
      0.000000
    
    
      514
      515
      2015
      Pune
      2015-05-20
      Royal Challengers Bangalore
      Rajasthan Royals
      Royal Challengers Bangalore
      bat
      normal
      0
      Royal Challengers Bangalore
      71
      0
      AB de Villiers
      Maharashtra Cricket Association Stadium
      AK Chaudhary
      C Shamshuddin
      NaN
      1
      -2.200375
      0.969143
      5
      50.000000
      0.000000
    
    
      515
      516
      2015
      Ranchi
      2015-05-22
      Royal Challengers Bangalore
      Chennai Super Kings
      Chennai Super Kings
      field
      normal
      0
      Chennai Super Kings
      0
      3
      A Nehra
      JSCA International Stadium Complex
      AK Chaudhary
      CB Gaffaney
      NaN
      0
      -0.521025
      1.039181
      -23
      38.888889
      33.333333
    
    
      516
      517
      2015
      Kolkata
      2015-05-24
      Mumbai Indians
      Chennai Super Kings
      Chennai Super Kings
      field
      normal
      0
      Mumbai Indians
      41
      0
      RG Sharma
      Eden Gardens
      HDPK Dharmasena
      RK Illingworth
      NaN
      1
      -1.575550
      -1.707931
      -24
      52.380952
      0.000000

Visualisation



In [73]:

    
#Graph for Strike Rate  
matches.boxplot(column = 'Avg_SR_Difference', by='team1Winning', showfliers= False)









    Out[73]:





<matplotlib.axes._subplots.AxesSubplot at 0x1240fe590>



In [74]:

    
#Graph for WPR Difference
matches.boxplot(column = 'Avg_WPR_Difference', by='team1Winning', showfliers= False)









    Out[74]:





<matplotlib.axes._subplots.AxesSubplot at 0x11d0824d0>



In [75]:

    
# Graph for MVP Difference
matches.boxplot(column = 'Total_MVP_Difference', by='team1Winning', showfliers= False)









    Out[75]:





<matplotlib.axes._subplots.AxesSubplot at 0x11d0ad590>



In [76]:

    
#Graph for Previous encounters Win Percentage of Team #1
matches.boxplot(column = 'Prev_Enc_Team1_WinPerc', by='team1Winning', showfliers= False)









    Out[76]:





<matplotlib.axes._subplots.AxesSubplot at 0x11d418850>



In [ ]:

    
# Graph for Recent form(Win Percentage in the same season)
matches.boxplot(column = 'Total_RF_Difference', by='team1Winning', showfliers= False)

Predictions for the data



In [77]:

    
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.cross_validation import train_test_split
from sklearn import metrics
from patsy import dmatrices



In [78]:

    
y, X = dmatrices('team1Winning ~ 0 + Avg_SR_Difference + Avg_WPR_Difference + Total_MVP_Difference + Prev_Enc_Team1_WinPerc + \
                  Total_RF_Difference', matches, return_type="dataframe")
y_arr = np.ravel(y)

Training and testing on Entire Data



In [79]:

    
# instantiate a logistic regression model, and fit with X and y
model = LogisticRegression()
model = model.fit(X, y_arr)
# check the accuracy on the training set
print "Accuracy is", model.score(X, y_arr)*100, "%"









    



Accuracy is 58.1039755352 %

Splitting train and test using train_test_split



In [80]:

    
# evaluate the model by splitting into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y_arr, random_state = 0)



In [81]:

    
# Logistic Regression on train_test_split
model2 = LogisticRegression()
model2.fit(X_train, y_train)
# predict class labels for the test set
predicted = model2.predict(X_test)
# generate evaluation metrics
print "Accuracy is ", metrics.accuracy_score(y_test, predicted)*100, "%"









    



Accuracy is  58.5365853659 %



In [82]:

    
# KNN Classification on train_test_split
k_range = list(range(1, 61))
k_score = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors = k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    k_score.append(metrics.accuracy_score(y_test, y_pred))
plt.plot(k_range, k_score)









    Out[82]:





[<matplotlib.lines.Line2D at 0x11d87f990>]



In [83]:

    
# Best values of k in train_test_split
knn = KNeighborsClassifier(n_neighbors = 50)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print "Accuracy is ", metrics.accuracy_score(y_test, y_pred)*100, "%"









    



Accuracy is  65.8536585366 %

Splitting Training Set (2008-2013) and Test Set (2013-2015) based on Seasons



In [84]:

    
#Splitting
X_timetrain = X.loc[X.index < 398]
Y_timetrain = y.loc[y.index < 398]
Y_timetrain_arr = np.ravel(Y_timetrain)
X_timetest = X.loc[X.index >= 398]
Y_timetest = y.loc[y.index >= 398]
Y_timetest_arr = np.ravel(Y_timetest)



In [85]:

    
# Logistic Regression on time-based split sets
model3 = LogisticRegression()
model3.fit(X_timetrain, Y_timetrain_arr)
timepredicted = model3.predict(X_timetest)
print "Accuracy is ", metrics.accuracy_score(Y_timetest_arr, timepredicted)*100, "%"









    



Accuracy is  51.724137931 %



In [86]:

    
# KNN Classification on time-based split sets
k_range = list(range(1, 61))
k_score = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors = k)
    knn.fit(X_timetrain, Y_timetrain_arr)
    y_pred = knn.predict(X_timetest)
    k_score.append(metrics.accuracy_score(Y_timetest_arr, y_pred))
plt.plot(k_range, k_score)









    Out[86]:





[<matplotlib.lines.Line2D at 0x11d9d5d90>]



In [87]:

    
# Best values of k in time-based split data
knn1 = KNeighborsClassifier(n_neighbors = 31)
knn1.fit(X_timetrain, Y_timetrain_arr)
y_pred = knn1.predict(X_timetest)
print "Accuracy is ", metrics.accuracy_score(Y_timetest_arr, y_pred)*100, "%"









    



Accuracy is  63.2183908046 %

Support Vector Machines



In [88]:

    
clf = svm.SVC(gamma=0.001, C=10)
clf.fit(X_timetrain, Y_timetrain_arr)
clf_pred = clf.predict(X_timetest)
print "Accuracy is ", metrics.accuracy_score(Y_timetest_arr, clf_pred)*100, "%"









    



Accuracy is  48.275862069 %

Random Forests



In [89]:

    
rfc = RandomForestClassifier(n_jobs = -1, random_state = 1)
rfc.fit(X_timetrain, Y_timetrain_arr)
rfc_pred = rfc.predict(X_timetest)
print "Accuracy is ", metrics.accuracy_score(Y_timetest_arr, rfc_pred)*100, "%"









    



Accuracy is  47.1264367816 %



In [90]:

    
fi = zip(X.columns, rfc.feature_importances_)
print "Feature Importance according to Random Forests Model\n"
for i in fi:
    print i[0], ":", i[1]









    



Feature Importance according to Random Forests Model

Avg_SR_Difference : 0.303135787959
Avg_WPR_Difference : 0.280738074933
Total_MVP_Difference : 0.172786216394
Prev_Enc_Team1_WinPerc : 0.13956540915
Total_RF_Difference : 0.103774511565

Naive Bayes Classifier



In [91]:

    
gclf = GaussianNB()
gclf.fit(X_timetrain, Y_timetrain_arr)
gclf_pred = gclf.predict(X_timetest)
print "Accuracy is ", metrics.accuracy_score(Y_timetest_arr, gclf_pred) *100, "%"









    



Accuracy is  50.5747126437 %



In [ ]:



In [ ]:



In [ ]:

	id	season	city	date	team1	team2	toss_winner	toss_decision	result	winner	win_by_runs	win_by_wickets	player_of_match	venue	umpire1	umpire2	umpire3
0	1	2008	Bangalore	2008-04-18	Kolkata Knight Riders	Royal Challengers Bangalore	Royal Challengers Bangalore	field	normal	Kolkata Knight Riders	140	0	BB McCullum	M Chinnaswamy Stadium	Asad Rauf	RE Koertzen	NaN
1	2	2008	Chandigarh	2008-04-19	Chennai Super Kings	Kings XI Punjab	Chennai Super Kings	bat	normal	Chennai Super Kings	33	0	MEK Hussey	Punjab Cricket Association Stadium, Mohali	MR Benson	SL Shastri	NaN
2	3	2008	Delhi	2008-04-19	Rajasthan Royals	Delhi Daredevils	Rajasthan Royals	bat	normal	Delhi Daredevils	0	9	MF Maharoof	Feroz Shah Kotla	Aleem Dar	GA Pratapkumar	NaN
3	4	2008	Mumbai	2008-04-20	Mumbai Indians	Royal Challengers Bangalore	Mumbai Indians	bat	normal	Royal Challengers Bangalore	0	5	MV Boucher	Wankhede Stadium	SJ Davis	DJ Harper	NaN
4	5	2008	Kolkata	2008-04-20	Deccan Chargers	Kolkata Knight Riders	Deccan Chargers	bat	normal	Kolkata Knight Riders	0	5	DJ Hussey	Eden Gardens	BF Bowden	K Hariharan	NaN
5	6	2008	Jaipur	2008-04-21	Kings XI Punjab	Rajasthan Royals	Kings XI Punjab	bat	normal	Rajasthan Royals	0	6	SR Watson	Sawai Mansingh Stadium	Aleem Dar	RB Tiffin	NaN
6	7	2008	Hyderabad	2008-04-22	Deccan Chargers	Delhi Daredevils	Deccan Chargers	bat	normal	Delhi Daredevils	0	9	V Sehwag	Rajiv Gandhi International Stadium, Uppal	IL Howell	AM Saheba	NaN
7	8	2008	Chennai	2008-04-23	Chennai Super Kings	Mumbai Indians	Mumbai Indians	field	normal	Chennai Super Kings	6	0	ML Hayden	MA Chidambaram Stadium, Chepauk	DJ Harper	GA Pratapkumar	NaN
8	9	2008	Hyderabad	2008-04-24	Deccan Chargers	Rajasthan Royals	Rajasthan Royals	field	normal	Rajasthan Royals	0	3	YK Pathan	Rajiv Gandhi International Stadium, Uppal	Asad Rauf	MR Benson	NaN
9	10	2008	Chandigarh	2008-04-25	Kings XI Punjab	Mumbai Indians	Mumbai Indians	field	normal	Kings XI Punjab	66	0	KC Sangakkara	Punjab Cricket Association Stadium, Mohali	Aleem Dar	AM Saheba	NaN

	id	season	city	date	team1	team2	toss_winner	toss_decision	result	winner	win_by_runs	win_by_wickets	player_of_match	venue	umpire1	umpire2	umpire3	team1Winning	Avg_SR_Difference	Avg_WPR_Difference	Total_MVP_Difference	Prev_Enc_Team1_WinPerc	Total_RF_Difference
489	490	2015	Mumbai	2015-05-01	Mumbai Indians	Rajasthan Royals	Rajasthan Royals	field	normal	Mumbai Indians	8	0	AT Rayudu	Wankhede Stadium	HDPK Dharmasena	CK Nandan	NaN	1	-13.536521	0.097342	5	60.000000	-33.333333
490	491	2015	Bangalore	2015-05-02	Kolkata Knight Riders	Royal Challengers Bangalore	Royal Challengers Bangalore	field	normal	Royal Challengers Bangalore	0	7	Mandeep Singh	M Chinnaswamy Stadium	JD Cloete	PG Pathak	NaN	0	4.510188	-0.570758	7	57.142857	0.000000
492	493	2015	Chandigarh	2015-05-03	Mumbai Indians	Kings XI Punjab	Mumbai Indians	bat	normal	Mumbai Indians	23	0	LMP Simmons	Punjab Cricket Association Stadium, Mohali	RK Illingworth	VA Kulkarni	NaN	1	-13.615487	-0.737024	-1	46.666667	66.666667
493	494	2015	Mumbai	2015-05-03	Rajasthan Royals	Delhi Daredevils	Delhi Daredevils	field	normal	Rajasthan Royals	14	0	AM Rahane	Brabourne Stadium	HDPK Dharmasena	CB Gaffaney	NaN	1	10.775508	0.786552	7	60.000000	0.000000
494	495	2015	Chennai	2015-05-04	Chennai Super Kings	Royal Challengers Bangalore	Chennai Super Kings	bat	normal	Chennai Super Kings	24	0	SK Raina	MA Chidambaram Stadium, Chepauk	C Shamshuddin	K Srinath	NaN	1	4.569589	-0.231439	25	58.823529	0.000000
496	497	2015	Mumbai	2015-05-05	Delhi Daredevils	Mumbai Indians	Delhi Daredevils	bat	normal	Mumbai Indians	0	5	Harbhajan Singh	Wankhede Stadium	HDPK Dharmasena	CB Gaffaney	NaN	0	-18.726971	-0.755737	-14	53.333333	-33.333333
497	498	2015	Bangalore	2015-05-06	Royal Challengers Bangalore	Kings XI Punjab	Kings XI Punjab	field	normal	Royal Challengers Bangalore	138	0	CH Gayle	M Chinnaswamy Stadium	RK Illingworth	VA Kulkarni	NaN	1	-5.574733	0.886535	5	35.714286	66.666667
499	500	2015	Chennai	2015-05-08	Chennai Super Kings	Mumbai Indians	Chennai Super Kings	bat	normal	Mumbai Indians	0	6	HH Pandya	MA Chidambaram Stadium, Chepauk	CB Gaffaney	CK Nandan	NaN	0	5.014283	1.972348	17	52.631579	0.000000
500	501	2015	Kolkata	2015-05-09	Kings XI Punjab	Kolkata Knight Riders	Kings XI Punjab	bat	normal	Kolkata Knight Riders	0	1	AD Russell	Eden Gardens	AK Chaudhary	HDPK Dharmasena	NaN	0	6.415078	0.254813	-18	40.000000	-33.333333
502	503	2015	Mumbai	2015-05-10	Royal Challengers Bangalore	Mumbai Indians	Royal Challengers Bangalore	bat	normal	Royal Challengers Bangalore	39	0	AB de Villiers	Wankhede Stadium	JD Cloete	C Shamshuddin	NaN	1	-1.343308	2.062729	-3	43.750000	-33.333333
503	504	2015	Chennai	2015-05-10	Chennai Super Kings	Rajasthan Royals	Chennai Super Kings	bat	normal	Chennai Super Kings	12	0	RA Jadeja	MA Chidambaram Stadium, Chepauk	M Erasmus	CK Nandan	NaN	1	-5.738591	-0.046456	16	62.500000	33.333333
505	506	2015	Raipur	2015-05-12	Chennai Super Kings	Delhi Daredevils	Chennai Super Kings	bat	normal	Delhi Daredevils	0	6	Z Khan	Shaheed Veer Narayan Singh International Stadium	RK Illingworth	VA Kulkarni	NaN	0	6.941454	1.678318	28	73.333333	33.333333
506	507	2015	Chandigarh	2015-05-13	Kings XI Punjab	Royal Challengers Bangalore	Royal Challengers Bangalore	field	normal	Kings XI Punjab	22	0	AR Patel	Punjab Cricket Association Stadium, Mohali	JD Cloete	C Shamshuddin	NaN	1	5.622383	-1.324729	-16	60.000000	-66.666667
507	508	2015	Mumbai	2015-05-14	Mumbai Indians	Kolkata Knight Riders	Kolkata Knight Riders	field	normal	Mumbai Indians	5	0	HH Pandya	Wankhede Stadium	RK Illingworth	VA Kulkarni	NaN	1	-0.677689	-0.313345	-11	66.666667	33.333333
509	510	2015	Chandigarh	2015-05-16	Kings XI Punjab	Chennai Super Kings	Kings XI Punjab	bat	normal	Chennai Super Kings	0	7	P Negi	Punjab Cricket Association Stadium, Mohali	CK Nandan	C Shamshuddin	NaN	0	-0.716536	1.824407	-33	42.857143	0.000000
510	511	2015	Mumbai	2015-05-16	Rajasthan Royals	Kolkata Knight Riders	Rajasthan Royals	bat	normal	Rajasthan Royals	9	0	SR Watson	Brabourne Stadium	RM Deshpande	RK Illingworth	NaN	1	-3.303823	-1.181868	-16	50.000000	0.000000
513	514	2015	Mumbai	2015-05-19	Mumbai Indians	Chennai Super Kings	Mumbai Indians	bat	normal	Mumbai Indians	25	0	KA Pollard	Wankhede Stadium	HDPK Dharmasena	RK Illingworth	NaN	1	6.315981	-0.617777	-24	50.000000	0.000000
514	515	2015	Pune	2015-05-20	Royal Challengers Bangalore	Rajasthan Royals	Royal Challengers Bangalore	bat	normal	Royal Challengers Bangalore	71	0	AB de Villiers	Maharashtra Cricket Association Stadium	AK Chaudhary	C Shamshuddin	NaN	1	-2.200375	0.969143	5	50.000000	0.000000
515	516	2015	Ranchi	2015-05-22	Royal Challengers Bangalore	Chennai Super Kings	Chennai Super Kings	field	normal	Chennai Super Kings	0	3	A Nehra	JSCA International Stadium Complex	AK Chaudhary	CB Gaffaney	NaN	0	-0.521025	1.039181	-23	38.888889	33.333333
516	517	2015	Kolkata	2015-05-24	Mumbai Indians	Chennai Super Kings	Chennai Super Kings	field	normal	Mumbai Indians	41	0	RG Sharma	Eden Gardens	HDPK Dharmasena	RK Illingworth	NaN	1	-1.575550	-1.707931	-24	52.380952	0.000000